ABSTRACT

In this paper we present a novel method for robust hyperspectral image classification using context and rejection. Hyperspectral image classification is generally an ill-posed problem in which pixels may belong to unknown classes, and obtaining representative and complete training sets is costly. Furthermore, the need for high classification accuracy frequently outweighs the need to classify the entire image.

We approach this problem with a robust classification method that combines classification with context and classification with rejection. A rejection field that guides the rejection is derived from the classification with contextual information obtained with the SegSALSA [1] algorithm. We validate our method on real hyperspectral data and show that the performance gains obtained from the rejection fields are equivalent to an increase in the size of the training sets.

Index Terms - Hyperspectral image classification, hidden fields, robust classification, classification with rejection.

1. INTRODUCTION 

Hyperspectral image classification is a challenging problem in remote sensing. Due to the generally ill-posed nature of hyperspectral image segmentation and classification, spatial regularization (e.g., promoting piecewise smooth classifications) is often used to provide context to the classification. However, context alone cannot deal with the difficulties arising from pixels that belong to unknown classes, from unrepresentative and incomplete training sets, or from overlapping classes. We propose a method that, combined with contextual classification, mitigates these difficulties through the inclusion of a reject option, thus achieving robust classification.
In applications where classification performance is critical, performance gains can be obtained at the expense of not classifying all the samples. This can be achieved by selectively abstaining from classification in situations where misclassifications are expected. Classification with rejection was first analyzed in [?], where a rejection rule for the optimum error-reject trade-off was designed for binary classification. Whereas the design of systems for classification with rejection is a rich area (see [?] and references therein for state-of-the-art systems), such systems are rarely applied to pixelwise image classification, and to hyperspectral image classification in particular.

In this paper we are interested in combining classification with context and classification with rejection to obtain a robust classification scheme. This means combining the option to reject when the evidence for a classification is insufficient (i.e., to reject when the classifier is likely to misclassify) with the cues that arise from spatial context information (i.e., classification under the assumption of a piecewise smooth labeling). By associating spatial context with rejection, context cues influence the decision whether to reject (e.g., a sample is less likely to be rejected if all its neighbors have the same label), and rejection cues influence the context (e.g., a sample is more likely to be rejected if all its neighbors are also rejected). The robust classification idea was applied to tissue classification in stained microscopy images [5], where rejection is considered an extra class and Markov random fields are used as a spatial contextual prior, with significant performance improvements. A major drawback of this approach is its rigidity with regard to the relative importance of rejection: if the desired amount of rejection changes, the context has to be recomputed.

We propose a robust classification scheme that computes the rejection after the context, allowing us to change the amount of rejected samples on the fly. By using the hidden fields resulting from segmentation via the constrained split augmented Lagrangian shrinkage algorithm (SegSALSA) [?], we are able to infer a rejection field that orders the image pixels according to the degree of confidence associated with the contextual information, thus providing a simple and effective way to classify with rejection and context.

The paper is organized as follows: Section 2 provides background on the contextual classification algorithm (SegSALSA) and on performance measures for classification with rejection. Section 3 introduces the rejection field and describes its construction and properties. Section 4 presents experimental results, and Section 5 concludes the paper.

2. BACKGROUND 

SegSALSA. The SegSALSA algorithm performs a marginal maximum a posteriori (MMAP) segmentation through the marginalization, on the discrete labels, of a hidden field driving the probabilities [7], and applies a vectorial total variation (VTV) prior [?] on the hidden field. This results in a convex segmentation formulation that is solved using the constrained split augmented Lagrangian shrinkage algorithm (SALSA).

To describe the SegSALSA algorithm, we start by introducing some notation. Let $\mathbf{x} \in \mathbb{R}^{d\times n}$ represent an $n$-pixel hyperspectral image with $d$ bands and $\mathbf{x}_i \in \mathbb{R}^d$ represent the feature vector of the $i$th image pixel, with $S = \{1, \dots, n\}$ a set indexing the image pixels. Let $\mathcal{C} = \{1, \dots, K\}$ denote the set of $K$ possible labels, and $\mathbf{y} \in \mathcal{C}^n$ a labeling of the image, with $y_i \in \mathcal{C}$ the label of the $i$th pixel.

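Under a Bayesian perspective, the maximum a posteriori (MAP) labeling $\hat{\mathbf{y}}$ is given by

$$ \hat{\mathbf{y}} = \arg\max_{\mathbf{y}\in\mathcal{C}^n} p(\mathbf{y}|\mathbf{x}) = \arg\max_{\mathbf{y}\in\mathcal{C}^n} p(\mathbf{x}|\mathbf{y})\,p(\mathbf{y}), \tag{1} $$

where $p(\mathbf{y}|\mathbf{x})$ represents the posterior probability of the labeling $\mathbf{y}$ given the feature vectors $\mathbf{x}$, $p(\mathbf{x}|\mathbf{y})$ the observation model, and $p(\mathbf{y})$ the prior probability of the labeling $\mathbf{y}$.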

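SegSALSA approaches the segmentation, or labeling, problem by introducing a hidden field [7] $\mathbf{z}$, represented by a $K \times n$ matrix that, for each pixel $i \in S$, contains the hidden random vector $\mathbf{z}_i \in \mathbb{R}^K$. The joint probability of the labels $\mathbf{y}$ and the field $\mathbf{z}$ is defined as $p(\mathbf{y},\mathbf{z}) = p(\mathbf{y}|\mathbf{z})\,p(\mathbf{z})$, with $p(\mathbf{y}|\mathbf{z}) = \prod_{i\in S} p(y_i|\mathbf{z}_i)$, which allows the joint probability of the features, labels, and fields $(\mathbf{x}, \mathbf{y}, \mathbf{z})$ to be expressed as $p(\mathbf{x},\mathbf{y},\mathbf{z}) = p(\mathbf{x}|\mathbf{y})\,p(\mathbf{y}|\mathbf{z})\,p(\mathbf{z})$. With the hidden field and the joint probability defined, the marginalization on the discrete labels is now possible:

$$ p(\mathbf{x},\mathbf{z}) = \left( \prod_{i\in S} \sum_{y_i\in\mathcal{C}} p(\mathbf{x}_i|y_i)\,p(y_i|\mathbf{z}_i) \right) p(\mathbf{z}), $$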

with the MMAP estimate being $\hat{\mathbf{z}}_{\text{MMAP}} = \arg\max_{\mathbf{z}}\, p(\mathbf{x},\mathbf{z})$.

By modeling the conditional probability $p(y_i = k|\mathbf{z}_i)$ as the $k$th component of the $i$th random vector, $[\mathbf{z}_i]_k$, two constraints are introduced on the hidden field $\mathbf{z}$: a nonnegativity constraint (i.e., $[\mathbf{z}_i]_k \geq 0$) and a sum-to-one constraint (i.e., $\sum_k [\mathbf{z}_i]_k = 1$). As only the discriminative power of the conditional probabilities $\mathbf{p}_i := [p(\mathbf{x}_i|y_i = 1), \dots, p(\mathbf{x}_i|y_i = K)]^T$ is relevant to the segmentation problem, we model them with multinomial logistic regression (MLR) and use the logistic regression via splitting and augmented Lagrangian (LORSAL) algorithm [11] to learn the regression weights.
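The claim that only the discriminative power of $\mathbf{p}_i$ matters can be checked numerically. Multiplying each $\mathbf{p}_i$ by a positive per-pixel constant $c_i$ adds $-\sum_i \ln c_i$ to the data term $-\sum_i \ln(\mathbf{p}_i^T\mathbf{z}_i)$ of the MMAP objective; that constant does not depend on $\mathbf{z}$, so the minimizer is unchanged. If the class priors implicit in the MLR are assumed uniform (an assumption), Bayes' rule gives $p(\mathbf{x}_i|y_i=k) \propto p(y_i=k|\mathbf{x}_i)$ at each pixel, which is why MLR posteriors can stand in for the likelihoods. The sketch below is illustrative only: `mlr_posteriors` and `data_term` are hypothetical names, the weights are placeholders rather than LORSAL estimates, and the feature map is taken to be the identity.

```python
import math


def mlr_posteriors(weights, x):
    """MLR class posteriors p(y = k | x): a softmax of linear scores.

    weights: K hypothetical weight vectors (stand-ins for LORSAL output).
    x: feature vector of one pixel (identity feature map assumed).
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    top = max(scores)  # shift by the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def data_term(P, Z):
    """Data term -sum_i ln(p_i^T z_i) for per-pixel vectors p_i and z_i."""
    return -sum(math.log(sum(p_k * z_k for p_k, z_k in zip(p, z)))
                for p, z in zip(P, Z))


# Two toy pixels, K = 3 classes, d = 2 features (all values hypothetical).
weights = [[1.0, -0.5], [0.2, 0.3], [-1.0, 0.8]]
X = [[0.5, 1.0], [2.0, -1.0]]
P = [mlr_posteriors(weights, x) for x in X]

# Rescale each pixel's vector by an arbitrary positive constant c_i.
scales = [5.0, 0.1]
P_scaled = [[c * p_k for p_k in p] for c, p in zip(scales, P)]

# Two candidate hidden fields (each per-pixel vector sums to one).
Z1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
Z2 = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]

# Rescaling shifts both costs by the same constant, -sum(ln c_i),
# so the comparison between Z1 and Z2 is unchanged.
gap_original = data_term(P, Z1) - data_term(P, Z2)
gap_scaled = data_term(P_scaled, Z1) - data_term(P_scaled, Z2)
print(abs(gap_original - gap_scaled) < 1e-9)  # True
```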

By dealing with the MMAP problem instead of the MAP problem, the prior is no longer applied on the discrete labels $\mathbf{y}$ but on the continuous hidden field $\mathbf{z}$. A convex VTV prior [9] is applied on the hidden field, promoting smoothness along the spatial dimensions of the field as well as the preservation and alignment of discontinuities across the classes.
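The exact VTV functional is not spelled out here. A common isotropic form, used below purely as an assumption, sums over pixels the Euclidean norm of the stacked horizontal and vertical differences of all $K$ class maps, with $-\ln p(\mathbf{z})$ taken as $\lambda\,\mathrm{VTV}(\mathbf{z})$ up to a constant. Pooling the classes under one square root is what couples them: a jump shared by two class maps at the same pixel costs $\sqrt{2}$ rather than $2$. The function name `vtv`, the field layout `z[k][r][c]`, and the periodic boundary handling are all hypothetical choices for this sketch.

```python
import math


def vtv(z):
    """Isotropic vectorial total variation of a field z[k][r][c].

    Forward differences with periodic boundaries (an assumption). The
    squared differences of all K class maps at a pixel share one square
    root, which couples the classes.
    """
    K, H, W = len(z), len(z[0]), len(z[0][0])
    total = 0.0
    for r in range(H):
        for c in range(W):
            sq = 0.0
            for k in range(K):
                dh = z[k][r][(c + 1) % W] - z[k][r][c]  # horizontal difference
                dv = z[k][(r + 1) % H][c] - z[k][r][c]  # vertical difference
                sq += dh * dh + dv * dv
            total += math.sqrt(sq)
    return total


# A 1 x 4 image with K = 2 classes; both class maps jump at the same pixels.
z1 = [[1.0, 1.0, 0.0, 0.0]]
z2 = [[0.0, 0.0, 1.0, 1.0]]
coupled = vtv([z1, z2])           # 2 * sqrt(2), about 2.83
separate = vtv([z1]) + vtv([z2])  # per-class TVs summed: 4.0
```

Calling `vtv` with a single class map gives that map's ordinary isotropic TV, which is how `separate` is formed for comparison.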

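From the initial integer optimization problem in (1), the contextual classification problem is now formulated as a convex optimization problem

$$ \hat{\mathbf{z}}_{\text{MMAP}} = \arg\min_{\mathbf{z}} \; -\sum_{i\in S} \ln\left(\mathbf{p}_i^T \mathbf{z}_i\right) - \ln p(\mathbf{z}) \quad \text{subject to: } \mathbf{z} \geq 0,\; \mathbf{1}_K^T \mathbf{z} = \mathbf{1}_n^T. \tag{2} $$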
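The constraints in (2) restrict every column $\mathbf{z}_i$ to the probability simplex. ADMM-type solvers commonly enforce such a constraint through a Euclidean projection; whether the SALSA iterations of SegSALSA use exactly this step is not stated here, so the sketch below only illustrates the standard sort-based projection onto the simplex, applied column by column. The names `project_to_simplex` and `project_field` are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1}.

    Sort-based algorithm: with u sorted in decreasing order, find the
    largest j such that u_j - (u_1 + ... + u_j - 1) / j > 0, then subtract
    the corresponding threshold theta from v and clip at zero.
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:
            theta = t  # keeps the threshold of the largest valid j
    return [max(x - theta, 0.0) for x in v]


def project_field(Z):
    """Project each per-pixel vector z_i of a hidden field onto the simplex."""
    return [project_to_simplex(z_i) for z_i in Z]


print(project_to_simplex([0.9, 0.6, -0.2]))  # approximately [0.65, 0.35, 0.0]
```

The projection costs $O(K \log K)$ per pixel because of the sort, which is negligible for the small number of classes typical in hyperspectral scenes.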

Based on $\hat{\mathbf{z}}_{\text{MMAP}}$, $p(\mathbf{y}|\hat{\mathbf{z}}_{\text{MMAP}})$ provides a soft classification, and its maximization with respect to $\mathbf{y}$ provides a hard classification. The optimization (2) is solved with SALSA [10], an instance of the alternating direction method of multipliers, in $O(Kn\log n)$ time.

Performance measures for classification with rejection 

To assess the performance of classification systems with rejection we use the nonrejected accuracy $A$, the rejected fraction $r$, and the classification quality $Q$ [?]. The nonrejected accuracy measures the accuracy on the subset of samples that are not rejected, the rejected fraction measures how much rejection is performed, and the classification quality jointly measures how accurate the classification is on the nonrejected samples and how inaccurate it is on the rejected samples.

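Considering $S$ the set of pixel indices, let $\mathcal{R}$ denote the set of rejected samples ($\bar{\mathcal{R}}$ the set of nonrejected samples) and $\mathcal{C}$ the set of correctly classified samples ($\bar{\mathcal{C}}$ the set of incorrectly classified samples). The rejected fraction is $r = |\mathcal{R}|/|S|$, and we define the nonrejected accuracy $A$ as

$$ A = \frac{|\mathcal{C}\cap\bar{\mathcal{R}}|}{|\bar{\mathcal{R}}|}. $$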

Even when paired with the corresponding rejected fraction, this measure cannot directly compare the behavior of two classifiers with rejection that operate at different rejected fractions.

The classification quality $Q$ is defined as

$$ Q = \frac{|\mathcal{C}\cap\bar{\mathcal{R}}| + |\bar{\mathcal{C}}\cap\mathcal{R}|}{|S|}. $$

The classification quality measures the proportion of samples that are either correctly classified and not rejected or incorrectly classified and rejected, relative to the total number of samples.
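The three measures can be computed directly from two boolean masks, one marking the correctly classified samples (the set $\mathcal{C}$) and one marking the rejected samples (the set $\mathcal{R}$). The sketch below assumes per-sample Python lists and a hypothetical function name, `rejection_metrics`; returning NaN for $A$ when every sample is rejected is our own convention (Table 1 appears to report 0.00 for a fully rejected class).

```python
import math


def rejection_metrics(correct, rejected):
    """Return (A, r, Q) for classification with rejection.

    correct[i]  -- True if sample i is correctly classified (i in C)
    rejected[i] -- True if sample i is rejected (i in R)
    """
    n = len(correct)
    n_rejected = sum(rejected)
    # correct decisions that are kept (correct and not rejected)
    kept_correct = sum(c and not r for c, r in zip(correct, rejected))
    # errors that are rejected (incorrect and rejected)
    rejected_wrong = sum(r and not c for c, r in zip(correct, rejected))
    r = n_rejected / n
    A = kept_correct / (n - n_rejected) if n_rejected < n else math.nan
    Q = (kept_correct + rejected_wrong) / n
    return A, r, Q


correct = [True, True, False, False, True]
rejected = [False, True, True, False, False]
A, r, Q = rejection_metrics(correct, rejected)  # A = 2/3, r = 0.4, Q = 0.6
```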




A classifier with rejection that achieves a classification quality $Q$ while rejecting a fraction $r$ of the samples is equivalent, in terms of correct decisions, to a classifier with no rejection whose accuracy is numerically equal to $Q$. The classification quality thus allows us to directly compare the performance of classification systems with rejection operating at different rejected fractions.
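The relation between $Q$, $A$, and $r$ follows from the definitions. Writing $r = |\mathcal{R}|/|S|$, assuming $0 < r < 1$, and letting $\mathrm{OA} = |\mathcal{C}|/|S|$ denote the accuracy of the same classification without rejection (our notation, matching the OA column of Table 1; rejection is assumed not to change the labels), splitting each set by its parent set gives:

```latex
\begin{aligned}
Q &= \frac{|\bar{\mathcal{R}}|}{|S|}\,\frac{|\mathcal{C}\cap\bar{\mathcal{R}}|}{|\bar{\mathcal{R}}|}
   + \frac{|\mathcal{R}|}{|S|}\,\frac{|\bar{\mathcal{C}}\cap\mathcal{R}|}{|\mathcal{R}|}
   = (1-r)\,A + r\,\frac{|\bar{\mathcal{C}}\cap\mathcal{R}|}{|\mathcal{R}|},\\[4pt]
Q - \mathrm{OA} &= \frac{|\mathcal{C}\cap\bar{\mathcal{R}}| + |\bar{\mathcal{C}}\cap\mathcal{R}|}{|S|}
   - \frac{|\mathcal{C}\cap\bar{\mathcal{R}}| + |\mathcal{C}\cap\mathcal{R}|}{|S|}
   = \frac{|\bar{\mathcal{C}}\cap\mathcal{R}| - |\mathcal{C}\cap\mathcal{R}|}{|S|}.
\end{aligned}
```

When no sample is rejected, $\bar{\mathcal{R}} = S$ and both $Q$ and $A$ reduce to the ordinary accuracy. The second line shows that rejection raises the equivalent accuracy $Q$ above OA exactly when the rejected set contains more misclassified than correctly classified samples.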

3. REJECTION FIELD 

From the SegSALSA formulation and the resulting hidden field we can derive a contextual rejection scheme: the rejection field. The hidden field $\hat{\mathbf{z}}_{\text{MMAP}}$ that results from the optimization problem (2) provides an indication of the degree of confidence associated with each label at each pixel. That is, if $[\mathbf{z}_i]_k > [\mathbf{z}_j]_l$, we are led to believe that label $l$ at the $j$th pixel has a smaller degree of confidence associated with its classification than label $k$ at the $i$th pixel.

Considering the labeling

$$ \hat{\mathbf{y}} = \arg\max_{\mathbf{y}\in\mathcal{C}^n} p(\mathbf{y}|\hat{\mathbf{z}}_{\text{MMAP}}) $$

and the associated maximum probabilities

$$ [\mathbf{z}_{\hat{y}}]_i = p(\hat{y}_i|\hat{\mathbf{z}}_{\text{MMAP}}), \tag{3} $$

that is, the probabilities associated with the MMAP labeling, we note that the same reading of the hidden-field components as indicators of confidence can be applied to the entire labeling. If $[\mathbf{z}_{\hat{y}}]_i > [\mathbf{z}_{\hat{y}}]_j$, this indicates a higher degree of confidence in labeling the $i$th pixel as $\hat{y}_i$ than in labeling the $j$th pixel as $\hat{y}_j$.
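Because $p(\mathbf{y}|\mathbf{z}) = \prod_{i\in S} p(y_i|\mathbf{z}_i)$ factorizes over pixels, the MMAP labeling and the field in (3) can be read off the hidden field one pixel at a time: $\hat{y}_i$ is the index of the largest entry of $\hat{\mathbf{z}}_i$, and $[\mathbf{z}_{\hat{y}}]_i$ is that entry. A minimal sketch, assuming the field is given as a list of per-pixel probability vectors; the function name is hypothetical, labels are 0-based here (unlike the 1-based labels in the text), and ties go to the lowest index.

```python
def labels_and_rejection_field(z):
    """Per-pixel argmax labels and the rejection field of a hidden field.

    z: list of n per-pixel probability vectors z_i of length K.
    Returns (labels, field): labels[i] = argmax_k z_i[k] (0-based) and
    field[i] = max_k z_i[k], the probability of the chosen label.
    """
    labels, field = [], []
    for z_i in z:
        k = max(range(len(z_i)), key=z_i.__getitem__)  # first maximizer
        labels.append(k)
        field.append(z_i[k])
    return labels, field


z_hat = [[0.7, 0.2, 0.1], [0.4, 0.45, 0.15], [0.34, 0.33, 0.33]]
labels, field = labels_and_rejection_field(z_hat)
# labels == [0, 1, 0]; field == [0.7, 0.45, 0.34]
```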

We call the field $\mathbf{z}_{\hat{y}}$ in (3), associated with the labeling $\hat{\mathbf{y}}$, the rejection field. By sorting $\mathbf{z}_{\hat{y}}$ we obtain an ordering of the samples according to their relative confidence. Selecting a fraction of the lowest-confidence samples for rejection yields a simple, yet very effective, rejection scheme. This method not only allows specific values of the rejected fraction to be defined a priori, but also allows them to be changed instantly. Furthermore, the optimal rejected fraction (the rejected fraction that maximizes the classification quality) can be estimated from a subset of labeled samples, a validation set.
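Rejection with the field thus reduces to a sort and a cut. The sketch below (hypothetical function names) rejects a given fraction of the least confident pixels, and estimates the rejected fraction that maximizes $Q$ on a labeled validation set by sweeping the cut point once: rejecting one more pixel raises the count of correct decisions by one if that pixel was misclassified and lowers it by one otherwise. Transferring the estimate to the full image as a fraction, rather than as a threshold on the field value, is one possible choice and an assumption here.

```python
def reject_lowest(field, fraction):
    """Reject the round(fraction * n) pixels with the lowest field values."""
    n = len(field)
    n_reject = round(fraction * n)
    order = sorted(range(n), key=field.__getitem__)  # least confident first
    rejected = [False] * n
    for i in order[:n_reject]:
        rejected[i] = True
    return rejected


def estimate_rejected_fraction(val_field, val_correct):
    """Rejected fraction (and its Q) that maximizes Q on a validation set.

    val_field[i]   -- rejection-field value at validation pixel i
    val_correct[i] -- True if validation pixel i is correctly classified
    """
    m = len(val_field)
    order = sorted(range(m), key=val_field.__getitem__)
    score = sum(val_correct)  # correct decisions (Q * m) with nothing rejected
    best_score, best_j = score, 0
    for j, i in enumerate(order, start=1):
        score += -1 if val_correct[i] else 1  # effect of rejecting pixel i
        if score > best_score:
            best_score, best_j = score, j
    return best_j / m, best_score / m


val_field = [0.9, 0.3, 0.8, 0.2, 0.6]
val_correct = [True, False, True, False, True]
r_hat, q_hat = estimate_rejected_fraction(val_field, val_correct)  # 0.4, 1.0
rejected = reject_lowest([0.5, 0.1, 0.9, 0.3, 0.7], r_hat)
```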

The characteristics of the VTV prior used in SegSALSA indirectly impose context on the rejection field. Because the prior promotes smooth hidden fields and the preservation and cross-class alignment of discontinuities, it preserves the discontinuities in the maximum values of the hidden field and consequently promotes smoothness and the preservation of discontinuities in the rejection field.

The computation of a rejection field and its use as a rejection rule is an approximation to the problem of contextual rejection approached in [5], where a joint optimization on the labels and on the reject option is performed. We perform a sequential optimization: first an optimization on the labels, and then a binary optimization on the reject option through the use of a rejection field. Whereas the solution we obtain is an approximation to the contextual rejection problem (joint minimization), the sequential optimization we perform has a clear advantage over the joint optimization approach: the amount of rejection can be changed on the fly, whereas in the joint optimization approach the context has to be recomputed.

Table 1. Classwise performance measures for classification with rejection on the Indian Pines scene (Fig. 1, top row). OA corresponds to the accuracy of the SegSALSA classification method with no rejection (Fig. 1b); A corresponds to the nonrejected accuracy, Q to the classification quality, and r to the rejected fraction obtained from classification with rejection (Fig. 1c); n is the number of samples per class.

| class | OA (%) | A (%) | Q (%) | r (%) | n |
|---|---|---|---|---|---|
| alfalfa | 71.74 | 0.00 | 26.09 | 97.83 | 46 |
| corn no-till | 66.67 | 76.57 | 79.06 | 13.94 | 1428 |
| corn min-till | 53.13 | 47.33 | 43.49 | 36.87 | 830 |
| corn clean | 100.00 | 100.00 | 96.62 | 3.38 | 237 |
| grass past. | 77.85 | 81.06 | 75.78 | 13.66 | 483 |
| grass trees | 90.55 | 90.83 | 90.00 | 1.37 | 730 |
| grass mowed | 0.00 | 0.00 | 0.00 | 0.00 | 28 |
| hay | 99.16 | 100.00 | 97.70 | 3.14 | 478 |
| oats | 0.00 | 0.00 | 100.00 | 100.00 | 20 |
| soybean no-till | 72.94 | 74.09 | 71.81 | 7.10 | 972 |
| soybean min-till | 72.38 | 88.54 | 89.53 | 19.67 | 2455 |
| soybean clean | 79.26 | 78.50 | 69.48 | 14.50 | 593 |
| wheat | 86.34 | 86.21 | 85.37 | 0.98 | 205 |
| woods | 74.55 | 81.56 | 82.29 | 9.96 | 1265 |
| bldg. | 66.06 | 80.70 | 84.20 | 18.13 | 386 |
| stone | 32.26 | 32.26 | 32.26 | 0.00 | 93 |

4. EXPERIMENTAL RESULTS 

We illustrate the performance of our algorithm through the robust classification of the AVIRIS Indian Pines scene and the ROSIS Pavia University scene. The Indian Pines scene was acquired with the AVIRIS sensor over northwestern Indiana (USA); it is a 145 × 145 pixel hyperspectral image with 200 spectral bands (excluding water absorption bands) containing 16 not mutually exclusive classes. The Pavia University scene was acquired with the ROSIS sensor over Pavia (Italy); it is a 610 × 340 pixel hyperspectral image with 103 spectral bands containing 9 not mutually exclusive classes. We learn the MLR weights with LORSAL and use the SegSALSA algorithm to include context in the classification.

Fig. 1. Top row: robust classification of the Indian Pines scene: (a) ground truth; (b) classification with 15 training samples per class using LORSAL and SegSALSA (73.5% accuracy); (c) classification with the optimal rejected fraction (80.9% nonrejected accuracy at a rejected fraction of 14.7%, with a classification quality of 79.2%); (d) associated rejection field; (e) nonrejected accuracy and classification quality as functions of the rejected fraction (maximum classification quality in red). Bottom row: robust classification of the Pavia University scene: (f) ground truth; (g) classification with 15 training samples per class using LORSAL and SegSALSA (69.8% accuracy); (h) classification with the optimal rejected fraction (74.6% nonrejected accuracy at a rejected fraction of 12.9%, with a classification quality of 73.0%); (i) associated rejection field; (j) nonrejected accuracy and classification quality as functions of the rejected fraction (maximum classification quality in red).

Figure 1 illustrates the performance gains obtained by combining classification with context and classification with rejection. Using the rejection field, we are able to change the amount of rejected samples on the fly, without the need to recompute the context. Table 1 shows that the performance gains are not equally distributed among the classes. The bulk of the gains comes from increased performance in highly populated classes. This is achieved either at the cost of a minor drop in nonrejected accuracy in a small number of less populated classes, or through the entire rejection of a less populated class.
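The on-the-fly behavior can be made concrete. Once the rejection field is available, the pixels are sorted a single time, and the nonrejected accuracy and classification quality for every rejected fraction (curves of the kind shown in Fig. 1(e) and (j)) follow from two running counts, without touching the hidden field again. A minimal sketch, assuming per-pixel Python lists and a hypothetical function name, `rejection_curves`; the fraction $r = 1$, where $A$ is undefined, is skipped.

```python
def rejection_curves(field, correct):
    """(r, A, Q) for every number j = 0, ..., n - 1 of rejected pixels.

    field[i]   -- rejection-field value of pixel i
    correct[i] -- True if pixel i is correctly classified
    """
    n = len(field)
    order = sorted(range(n), key=field.__getitem__)  # least confident first
    kept_correct = sum(correct)  # correct and not rejected
    rejected_wrong = 0           # incorrect and rejected
    curve = []
    for j in range(n):
        curve.append((j / n, kept_correct / (n - j),
                      (kept_correct + rejected_wrong) / n))
        i = order[j]  # reject the next least confident pixel
        if correct[i]:
            kept_correct -= 1
        else:
            rejected_wrong += 1
    return curve


curve = rejection_curves([0.9, 0.3, 0.8, 0.2, 0.6],
                         [True, False, True, False, True])
best_r, best_A, best_Q = max(curve, key=lambda t: t[2])  # (0.4, 1.0, 1.0)
```

After the single $O(n \log n)$ sort, each additional rejected fraction costs $O(1)$, which is what makes changing the amount of rejection essentially free compared with recomputing the context.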


The performance gains obtained by allocating labeled samples to the estimation of the optimal rejected fraction (the rejected fraction that maximizes the classification quality) can be larger than the gains obtained by using those samples to extend the training set, retraining with LORSAL, and classifying the image with SegSALSA. This effect is illustrated in Table 2 where, for the Indian Pines scene and an initial training set of 30 samples per class, we show the effect of either estimating the optimal rejected fraction from 50 randomly selected samples or retraining the classifier with those 50 extra samples. Although the advantage of estimating the rejected fraction over retraining the classifier will not hold for smaller training sets, for larger training sets it is a computationally cheaper and better-performing alternative to retraining the classifier.


Table 2. Effect of increasing the size of the training set with new samples versus using the new samples as a validation set to estimate the rejected fraction r, on the Indian Pines scene. Average performance (rejected fraction r, classification quality Q, and nonrejected accuracy A) over 30 Monte Carlo runs.

| setting | r (%) | Q (%) | A (%) |
|---|---|---|---|
| initial: training set of 480 samples, no rejection | 0.00 | 84.21 | 84.21 |
| extended: training set of 480 + 50 samples, no rejection | 0.00 | 86.46 | 86.46 |
| estimated: training set of 480 samples, optimal rejected fraction estimated from 50 samples | 12.77 | 87.02 | 91.16 |
| optimal: training set of 480 samples, true optimal rejected fraction | 12.49 | 88.37 | 91.53 |


5. CONCLUDING REMARKS 

We presented a simple and effective scheme for robust hyperspectral image classification that combines classification with context and classification with rejection by deriving a rejection field from the hidden fields that drive the contextual classification. We moved from the joint optimization problem of context and rejection to a faster, sequential optimization without losing the contextual effect on the rejection. The performance gains obtained with robust classification are shown to be equivalent to training the classifier with larger training sets.